Extras

This covers additional useful material that we may or may not have time to go over in the course.

Generators

Consider the following code that computes the sum of squared numbers up to N.


In [ ]:
def squared_numbers(n):
    return [x*x for x in range(n)]

def sum_squares(n):
    return sum(squared_numbers(n+1))

sum_squares(20000000)

The code works, but it has one flaw: it builds a list of all the squares from 0 to N in memory before summing them. For large N this uses a lot of extra memory, which might lead to the system swapping or running out of memory.

In this case it is not necessary to create the entire list in memory. The sum function iterates over its input and only needs the running total and the next value at any one time.

The Python keyword yield is used to achieve this. Using yield inside a function body automatically makes that function a generator function. Calling it returns a generator, which can be iterated over like a list, but it only creates new values as they are needed.


In [ ]:
def squared_numbers_alternate(n):
    for x in range(n):
        yield x*x
        
def sum_squares_alternate(n):
    return sum(squared_numbers_alternate(n+1))

sum_squares_alternate(20000000)
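
The on-demand behaviour of a generator can be observed by stepping through one by hand. This is a minimal sketch: the function name first_squares is a hypothetical stand-in introduced only for this illustration, and next() is the built-in that advances an iterator by one step.

```python
def first_squares(n):
    """Yield the squares of the numbers below n (illustrative name)."""
    for x in range(n):
        yield x * x

gen = first_squares(3)   # nothing is computed yet, gen is a generator object
a = next(gen)            # runs the body up to the first yield
b = next(gen)            # resumes after that yield and runs to the next one
rest = list(gen)         # consumes whatever is left
print(a, b, rest)

again = list(gen)        # the generator is now exhausted
print(again)
```

Note that a generator can be iterated over only once. To go through the values again, call the generator function again.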

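At this point you may wonder: doesn't range() return a list? The short answer is no. In Python 3, range() returns a lazy range object that computes its values as they are requested, although the full details are more complicated.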
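
As a rough illustration, assuming Python 3, the sketch below shows that a range object is not a list and that its size does not depend on how many numbers it covers. sys.getsizeof reports the size of the object itself in bytes. The exact numbers depend on the interpreter, so none are quoted here.

```python
import sys

small = range(10)
huge = range(10**12)     # far too many numbers to hold in a list

is_list = isinstance(huge, list)
print(type(huge), is_list)

# the range object only stores its start, stop and step
print(sys.getsizeof(small), sys.getsizeof(huge))

# indexing computes the value on request, no list is built
print(huge[10**11])
```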

Synthesis

Python is often used to process text files, some of which may be quite large. A single row in a text file is typically not large, however. The following pattern reads a file one line at a time, which makes it possible to cleanly process a file much larger than what would fit in memory.


In [ ]:
import os
print(os.getcwd())
def grep(fileobject, pattern):
    for index, line in enumerate(fileobject):
        if pattern in line:
            # start indexing from 1 for humans
            # remove the white space at the end
            yield index+1, line.strip()
        
def process_file(input_, pattern):
    with open(input_, "r") as file_:
        for idx, line in grep(file_, pattern):
            print("line {} matches: {}".format(idx, line))
    print("done searching")    

process_file("../data/grep.txt", "test")
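
Because grep only needs an object that yields lines when iterated over, it can also be tried without a file on disk. This is a minimal sketch: the grep definition is repeated so that the block runs on its own, io.StringIO is the standard library's in-memory text file, and the sample text is made up for illustration.

```python
import io

def grep(fileobject, pattern):
    # same generator as above, repeated so this block is self-contained
    for index, line in enumerate(fileobject):
        if pattern in line:
            yield index + 1, line.strip()

# made-up sample text standing in for a real file
sample = io.StringIO(
    "first line\n"
    "this is a test\n"
    "nothing here\n"
    "another test line\n"
)

matches = list(grep(sample, "test"))
print(matches)
```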

Lambdas

One of the constructs Python supports is the lambda expression, which creates a small anonymous function.

square = lambda x: x*x

Here the result of the lambda expression, a function object, is assigned to the variable square. The part lambda x denotes that the lambda takes one parameter, x. The part : x*x says that the return value of the lambda is x*x.

It is equivalent to

def square(x):
    return x*x

The beauty of lambda expressions is that they don't need to be assigned to a name, but can instead be created on the fly.


In [ ]:
square = lambda x: x*x
print(square(4))

# a lambda can be called right where it is created, without giving it a name
(lambda x: x-1)(1)

A typical use case for lambda might be in accessing members of an object in a generic function.

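For instance, the sort method of a list takes a keyword parameter key, a function that is called on each item to produce the value to sort by. With a lambda it is trivial to do simple operations in the key, such as picking out a member or inverting a value.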


In [8]:
my_list = [
    ("apple", 5),
    ("banana", 3),
    ("pear", 10)
]
my_list.sort(key=lambda x: x[1])  # sort by the number
my_list


Out[8]:
[('banana', 3), ('apple', 5), ('pear', 10)]
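
Inverting a value in the key can be sketched as follows: negating the number sorts the list in descending order. The list is copied from the cell above, and the built-in sorted() is used here instead of list.sort() so that the original list is left untouched.

```python
my_list = [
    ("apple", 5),
    ("banana", 3),
    ("pear", 10),
]

# negate the number so that larger values sort first
descending = sorted(my_list, key=lambda x: -x[1])
print(descending)
```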

Lambda has many other uses but those are left as a thought exercise.

Write a function mean that takes a list and a function key as parameters and computes the mean of the values that key returns for the items of the list.

The default key is best left as a function that returns its parameter, i.e. ``lambda x: x``.


In [ ]:
def mean(...):
    pass